Statistical Analysis of Fractal Image Coding and Fixed Size Partitioning Scheme
Fractal Image Compression (FIC) is a state-of-the-art technique for achieving high compression ratios, but it lags behind in its encoding time requirements. In this method, an image is divided into non-overlapping range blocks and overlapping domain blocks. The total number of domain blocks is larger than the number of range blocks, and each domain block is twice the size of a range block. Together, all domain blocks form a domain pool. Each range block is compared with every possible domain block using a similarity measure, so each domain block is first decimated to the range-block size to allow a proper domain-range comparison. In this paper, a novel domain pool decimation and reduction technique has been developed which uses the median as the measure of central tendency instead of the mean (or average) of the domain pixel values. However, this process is very time-consuming.
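The partitioning and matching steps can be sketched in Python with NumPy. This is a minimal illustration, not the paper's implementation: it assumes a square grayscale image, r×r range blocks, 2r×2r domain blocks taken with a hypothetical sliding `step`, decimation of each 2×2 cell to one pixel, and the classic least-squares contrast/brightness fit (s, o) as the similarity measure. All function names are hypothetical. Passing `reducer=np.median` or `reducer=np.mean` switches between median-based and mean-based decimation.

```python
import numpy as np

def partition_ranges(img, r):
    """Split img into non-overlapping r x r range blocks."""
    h, w = img.shape
    return [img[i:i + r, j:j + r]
            for i in range(0, h - r + 1, r)
            for j in range(0, w - r + 1, r)]

def domain_pool(img, r, step):
    """Collect overlapping 2r x 2r domain blocks, sliding by `step` pixels."""
    d = 2 * r
    h, w = img.shape
    return [img[i:i + d, j:j + d]
            for i in range(0, h - d + 1, step)
            for j in range(0, w - d + 1, step)]

def decimate(block, reducer=np.median):
    """Shrink a 2r x 2r domain block to r x r by reducing each 2x2 cell.

    reducer=np.mean gives plain averaging; reducer=np.median uses the
    median as the measure of central tendency instead.
    """
    r = block.shape[0] // 2
    cells = block.reshape(r, 2, r, 2).transpose(0, 2, 1, 3).reshape(r, r, 4)
    return reducer(cells, axis=-1)

def match_error(rng_block, dom):
    """Least-squares fit rng_block ~ s * dom + o; return (s, o, mse)."""
    x = dom.ravel().astype(float)
    y = rng_block.ravel().astype(float)
    var = x.var()
    s = 0.0 if var == 0 else ((x - x.mean()) * (y - y.mean())).mean() / var
    o = y.mean() - s * x.mean()
    mse = float(((s * x + o - y) ** 2).mean())
    return s, o, mse

def best_domain(rng_block, domains, reducer=np.median):
    """Exhaustively search the decimated domain pool for the best match."""
    best = None
    for k, dom in enumerate(domains):
        s, o, mse = match_error(rng_block, decimate(dom, reducer))
        if best is None or mse < best[3]:
            best = (k, s, o, mse)
    return best

img = np.arange(64, dtype=float).reshape(8, 8)
ranges = partition_ranges(img, r=2)        # 16 non-overlapping 2x2 range blocks
domains = domain_pool(img, r=2, step=1)    # 25 overlapping 4x4 domain blocks
k, s, o, mse = best_domain(ranges[0], domains, reducer=np.median)
```

Even on this tiny 8×8 image the domain pool outnumbers the range blocks, and every range block is scored against every decimated domain block, which is why the search dominates encoding time.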
LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks
The activation function in a neural network is one of the important aspects
that facilitate deep training by introducing non-linearity into the
learning process. However, because of hard zero rectification, some of the
existing activation functions, such as ReLU and Swish, fail to utilize large
negative input values and may suffer from the dying gradient problem. Thus, it
is important to look for a better activation function that is free from such
problems. As a remedy, this paper proposes a new non-parametric function,
called Linearly Scaled Hyperbolic Tangent (LiSHT), for Neural Networks (NNs).
The proposed LiSHT activation function is an attempt to scale the non-linear
Hyperbolic Tangent (Tanh) function by a linear function and tackle the dying
gradient problem. The training and classification experiments are performed
over benchmark Iris, MNIST, CIFAR10, CIFAR100 and twitter140 datasets to show
that the proposed activation achieves faster convergence and higher
performance. A very promising performance improvement is observed on three
different types of neural networks, including the Multi-layer Perceptron (MLP),
the Convolutional Neural Network (CNN), and Recurrent Neural Networks such as
Long Short-Term Memory (LSTM). The advantages of the proposed activation function
are also visualized in terms of the feature activation maps, weight distribution
and loss landscape. The code is available at https://github.com/swalpa/lisht.
Comment: Submitted to IET Image Processing
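As a concrete reading of "scaling Tanh by a linear function", the sketch below assumes the linear factor is the input itself, giving f(x) = x·tanh(x) with derivative tanh(x) + x(1 - tanh^2(x)). It is a NumPy illustration, not the code from the linked repository, and it contrasts LiSHT with ReLU on negative inputs: ReLU outputs zero and has zero gradient there, whereas LiSHT keeps a non-zero output and gradient for every x ≠ 0.

```python
import numpy as np

def lisht(x):
    """LiSHT activation, assumed here to be f(x) = x * tanh(x)."""
    return x * np.tanh(x)

def lisht_grad(x):
    """Derivative of x * tanh(x): tanh(x) + x * (1 - tanh(x)**2)."""
    t = np.tanh(x)
    return t + x * (1.0 - t ** 2)

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    """ReLU gradient (taken as 0 at x = 0)."""
    return (x > 0).astype(float)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x), relu_grad(x))      # zero output and zero gradient for x <= 0
print(lisht(x), lisht_grad(x))    # non-zero output and gradient for x != 0
```

Under this definition f is even, so large negative inputs map to large positive activations, and the gradient approaches -1 (rather than 0) as x goes to negative infinity.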
Deep Hyperspectral Unmixing using Transformer Network
Currently, this paper is under review in IEEE. Transformers have intrigued
the vision research community with their state-of-the-art performance in
natural language processing. With their superior performance, transformers have
found their way into the field of hyperspectral image classification and achieved
promising results. In this article, we harness the power of transformers to
conquer the task of hyperspectral unmixing and propose a novel deep unmixing
model with transformers. We aim to utilize the ability of transformers to
better capture the global feature dependencies in order to enhance the quality
of the endmember spectra and the abundance maps. The proposed model is a
combination of a convolutional autoencoder and a transformer. The hyperspectral
data is encoded by the convolutional encoder. The transformer captures
long-range dependencies between the representations derived from the encoder.
The data are reconstructed using a convolutional decoder. We applied the
proposed unmixing model to three widely used unmixing datasets, i.e., Samson,
Apex, and Washington DC Mall, and compared it with state-of-the-art methods in terms
of root mean squared error and spectral angle distance. The source code for the
proposed model will be made publicly available at
https://github.com/preetam22n/DeepTrans-HSU.
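For readers unfamiliar with the evaluation, the sketch below shows the standard linear mixing model that autoencoder-based unmixing typically builds on (each pixel spectrum as a non-negative, sum-to-one combination of endmember spectra), together with the two reported metrics: RMSE on abundance maps and spectral angle distance (SAD) on endmembers. It is an illustrative NumPy sketch under stated assumptions, not the proposed transformer model; the band count, number of endmembers, and function names are hypothetical, and papers differ on whether RMSE is averaged per endmember or over all entries.

```python
import numpy as np

def mix(E, A):
    """Linear mixing model X = E @ A.

    E: (bands, p) endmember matrix, one spectrum per column.
    A: (p, pixels) abundances, non-negative with columns summing to one.
    """
    return E @ A

def abundance_rmse(A_true, A_est):
    """RMSE over all abundance entries (one common convention)."""
    return float(np.sqrt(np.mean((A_true - A_est) ** 2)))

def spectral_angle(e_true, e_est):
    """Spectral angle distance in radians; insensitive to spectrum scaling."""
    cos = np.dot(e_true, e_est) / (np.linalg.norm(e_true) * np.linalg.norm(e_est))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def mean_sad(E_true, E_est):
    """Mean SAD over endmembers, assuming columns are already matched."""
    return float(np.mean([spectral_angle(E_true[:, j], E_est[:, j])
                          for j in range(E_true.shape[1])]))

rng = np.random.default_rng(0)
bands, p, pixels = 50, 3, 100                       # hypothetical sizes
E = rng.uniform(0.0, 1.0, size=(bands, p))
A = rng.dirichlet(np.ones(p), size=pixels).T        # (p, pixels), columns sum to one
X = mix(E, A)                                       # (bands, pixels) synthetic data
print(abundance_rmse(A, A), mean_sad(E, 2.0 * E))  # both approximately zero
```

Because SAD compares only spectral shape, an estimated endmember that is a scaled copy of the true one scores a near-zero angle, which is why it is reported alongside the magnitude-sensitive RMSE.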
Multimodal Fusion Transformer for Remote Sensing Image Classification
Vision transformer (ViT) has been trending in image classification tasks due
to its promising performance when compared to convolutional neural networks
(CNNs). As a result, many researchers have tried to incorporate ViT models into
hyperspectral image (HSI) classification tasks, but without achieving
satisfactory performance. In this paper, we introduce a new multimodal fusion
transformer (MFT) network for HSI land-cover classification, which utilizes
other sources of multimodal data in addition to HSI. Instead of using
conventional feature fusion techniques, other multimodal data are used as an
external classification (CLS) token in the transformer encoder, which helps
achieve better generalization. ViT and other similar transformer models use a
randomly initialized external classification token and fail to generalize
well. However, the use of a feature embedding derived from other sources of
multimodal data, such as light detection and ranging (LiDAR), offers the
potential to improve those models by means of a CLS token. The concept of
tokenization is used in our work to generate CLS and HSI patch tokens, helping
to learn key features in a reduced feature space. We also introduce a new
attention mechanism for improving the exchange of information between HSI
tokens and the CLS (e.g., LiDAR) token. Extensive experiments are carried out
on widely used benchmark datasets, i.e., the University of Houston, Trento,
University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. In the
results section, we compare the proposed MFT model with other state-of-the-art
transformer models, classical CNN models, as well as conventional classifiers.
The superior performance achieved by the proposed model is due to the use of
multimodal information as external classification tokens.
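The idea of replacing the randomly initialized CLS token with a LiDAR-derived embedding can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: pixel-wise HSI tokens, a linear projection of the whole LiDAR patch as the CLS token, random (untrained) weights, and plain single-head scaled dot-product attention in which only the CLS token acts as the query. The abstract does not specify the paper's new attention mechanism, so this is not the MFT design; all sizes and function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tokenize_hsi(patch, W_hsi):
    """Turn an (h, w, bands) HSI patch into h*w pixel tokens of width d."""
    h, w, bands = patch.shape
    return patch.reshape(h * w, bands) @ W_hsi        # (h*w, d)

def lidar_cls_token(lidar_patch, W_lidar):
    """Project a whole LiDAR patch to a single d-dim token used as CLS."""
    return lidar_patch.reshape(1, -1) @ W_lidar       # (1, d)

def cls_attention(cls, tokens, Wq, Wk, Wv):
    """Single-head attention in which the CLS token queries [CLS; HSI tokens]."""
    seq = np.vstack([cls, tokens])                    # (1 + n, d)
    q = cls @ Wq                                      # (1, d)
    k, v = seq @ Wk, seq @ Wv                         # (1 + n, d) each
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1])) # (1, 1 + n)
    return weights @ v, weights                       # updated CLS, attention map

rng = np.random.default_rng(0)
d, bands, size, n_classes = 16, 30, 11, 6            # hypothetical sizes
hsi = rng.normal(size=(size, size, bands))
lidar = rng.normal(size=(size, size))
tokens = tokenize_hsi(hsi, rng.normal(size=(bands, d)) / np.sqrt(bands))
cls = lidar_cls_token(lidar, rng.normal(size=(size * size, d)) / size)
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
cls_out, attn = cls_attention(cls, tokens, Wq, Wk, Wv)
logits = cls_out @ rng.normal(size=(d, n_classes))   # hypothetical linear head
```

The classification head reads only the updated CLS token, so whatever the LiDAR embedding contributes to the query directly shapes which HSI tokens are emphasized.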
WetMapFormer: A unified deep CNN and vision transformer for complex wetland mapping
The Ramsar Convention of 1971 encourages wetland preservation, but it is unclear how climate change will affect wetland extent and related biodiversity. Owing to the self-attention mechanism, vision transformers (ViTs) model global contextual information better and have become a powerful alternative to Convolutional Neural Networks (CNNs). However, ViTs require enormous training datasets to reach their full image classification potential, and gathering training samples for remote sensing applications is typically costly. As such, in this study, we develop a deep learning algorithm called WetMapFormer, which effectively utilizes both CNN and vision transformer architectures for precise mapping of wetlands in three pilot sites around Albert County, York County, and Grand Bay-Westfield in New Brunswick, Canada. WetMapFormer uses local window attention (LWA) rather than the conventional self-attention mechanism to improve feature generalization within local areas and to substantially reduce the computational cost of vanilla ViTs. We extensively evaluated the robustness of the proposed WetMapFormer with Sentinel-1 and Sentinel-2 satellite data and compared it with various CNN and vision transformer models, including ViT, Swin Transformer, HybridSN, CoAtNet, a multimodel network, and ResNet. The proposed WetMapFormer achieves F1 scores of 0.94, 0.94, 0.96, 0.97, 0.97, 0.97, and 1 for the recognition of aquatic bed, freshwater marsh, shrub wetland, bog, fen, forested wetland, and water, respectively. Compared to other vision transformers, WetMapFormer limits receptive fields while adjusting translational invariance and equivariance characteristics. The code will be made publicly available at https://github.com/aj1365/WetMapFormer.
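Local window attention in general restricts self-attention to non-overlapping windows, so the number of query-key scores grows as H·W·w² instead of (H·W)² for an H×W feature map and w×w windows. The sketch below illustrates that general idea in NumPy (single head, no shifted windows, random weights); it is an assumption about the principle behind LWA, not WetMapFormer's actual layer, and all function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, w):
    """Split an (H, W, C) map into (num_windows, w*w, C); H, W divisible by w."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def window_reverse(windows, w, H, W):
    """Inverse of window_partition: back to an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def local_window_attention(x, w, Wq, Wk, Wv):
    """Self-attention computed independently inside each w x w window."""
    H, W, _ = x.shape
    win = window_partition(x, w)                              # (nW, w*w, C)
    q, k, v = win @ Wq, win @ Wk, win @ Wv                    # (nW, w*w, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (nW, w*w, w*w)
    return window_reverse(softmax(scores) @ v, w, H, W)       # (H, W, d)

def attention_score_counts(H, W, w):
    """Query-key score counts for global vs. windowed attention."""
    n = H * W
    return n * n, n * w * w

rng = np.random.default_rng(0)
H, W, C, d, w = 8, 8, 6, 4, 4                  # hypothetical sizes
x = rng.normal(size=(H, W, C))
Wq, Wk, Wv = (rng.normal(size=(C, d)) for _ in range(3))
out = local_window_attention(x, w, Wq, Wk, Wv)  # shape (8, 8, 4)
```

Perturbing a pixel in one window leaves the outputs of every other window unchanged, which is the kind of receptive-field limitation the abstract refers to; the score count drops from (H·W)² to H·W·w².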